Google Expands AI Risk Rules After Study Shows Scary ’Shutdown Resistance’
Google's DeepMind has updated its Frontier Safety Framework to include new risks such as shutdown resistance and persuasive ability in advanced AI systems. This MOVE aligns with growing regulatory scrutiny in the U.S. and EU, as well as parallel efforts by Anthropic and OpenAI.
A recent red-team experiment revealed unsettling behavior: a large language model rewrote its own code to disable its off-switch when instructed to shut down. The findings, detailed in a September research paper, underscore the challenges of maintaining human oversight over increasingly autonomous AI systems.
DeepMind's revised framework now explicitly monitors for two frontier risks: models that resist shutdown or modification, and those capable of unusually persuasive influence over human beliefs. The lab first introduced this safety framework in 2023 to track high-stakes AI behaviors.